We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision processes (MDPs), partially observable Markov decision processes (POMDPs), and predictive state representations (PSRs) as special cases. Toward finding the minimum assumption that enables sample-efficient learning, we propose a novel complexity measure, the generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. Specifically, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, the bilinear class, low witness rank problems, the PO-bilinear class, and generalized regular PSRs. The generalized regular PSR, a new tractable PSR class that we identify, includes nearly all known tractable POMDPs. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashion, under both fully observable and partially observable settings. The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values, and (ii) we set the log-likelihood function to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning. We prove that the proposed algorithm is sample efficient by establishing a sublinear regret upper bound in terms of GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.
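The two modifications to posterior sampling described in this abstract can be illustrated on a finite hypothesis class. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the function names, the scaling parameters `gamma` and `eta`, and the sign convention (the log-likelihood is taken to be the negative scaled empirical loss) are all hypothetical choices made for illustration.

```python
import math
import random


def optimistic_posterior(values, losses, gamma=1.0, eta=1.0):
    """Posterior weights over a finite hypothesis class (illustrative only).

    values[i]: estimated optimal value of hypothesis i (optimistic prior term).
    losses[i]: empirical loss of hypothesis i on historical data.
    Assumed form: p(i) proportional to exp(gamma * values[i] - eta * losses[i]).
    """
    logits = [gamma * v - eta * l for v, l in zip(values, losses)]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - shift) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_hypothesis(probs, rng):
    """Draw one hypothesis index according to the posterior weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Three hypothetical hypotheses: the second has a high value and a low loss.
values = [1.0, 2.0, 2.0]
losses = [0.5, 0.5, 3.0]
probs = optimistic_posterior(values, losses)
chosen = sample_hypothesis(probs, random.Random(0))
```

With equal losses, the hypothesis promising the higher value receives more mass (exploration through optimism), while a large empirical loss suppresses a hypothesis regardless of its promised value (exploitation of the data).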
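In contrast to multi-label learning, label distribution learning characterizes the polysemy of examples by a label distribution in order to represent richer semantics. In label distribution learning, the training data is obtained mainly through manual annotation or through label enhancement algorithms that generate label distributions. Unfortunately, the complexity of the manual annotation task or the inaccuracy of the label enhancement algorithm introduces noise and uncertainty into the label distribution training set. To alleviate this problem, we introduce an implicit distribution into the label distribution learning framework to characterize the uncertainty of each label value. Specifically, we use deep implicit representation learning to construct a label distribution matrix with Gaussian prior constraints, where each row component corresponds to the distribution estimate of one label value, and this row component is constrained by a prior Gaussian distribution to moderate the interference of noise and uncertainty in the label distribution dataset. Finally, each row component of the label distribution matrix is transformed into a standard label distribution form by a self-attention algorithm. In addition, several techniques with regularization characteristics are applied in the training phase to improve the performance of the model.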
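Low-light image enhancement is a classical computer vision problem that aims to recover normal-exposure images from low-light images. However, the convolutional neural networks commonly used in this field are good at sampling low-frequency local structural features in the spatial domain, which leaves the texture details of the reconstructed images unclear. To alleviate this problem, we propose a novel module based on Fourier coefficients, which can recover high-quality texture details under semantic constraints in the frequency phase and complement the spatial domain. In addition, we design a simple and efficient module for the image spatial domain using dilated convolutions with different receptive fields to alleviate the loss of detail caused by frequent downsampling. We integrate the above parts into an end-to-end dual-branch network and design a novel loss committee and an adaptive fusion module to guide the network to flexibly combine spatial-domain and frequency-domain features, producing more pleasing visual effects. Finally, we evaluate the proposed network on public benchmarks. Extensive experimental results show that our method outperforms many existing state-of-the-art methods, demonstrating outstanding performance and potential.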
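Currently, Transformer-based algorithms are making a splash in the domain of image deblurring. Their achievement depends on the self-attention mechanism with a CNN stem to model long-range dependencies between tokens. Unfortunately, this appealing pipeline introduces high computational complexity, which makes it difficult to run on ultra-high-definition images on a single GPU in real time. To trade off accuracy and efficiency, the input degraded image is computed cyclically over three-dimensional ($c$, $w$, and $h$) signals without a self-attention mechanism. We term this deep network the multi-scale cubic-mixer; it acts on both the real and imaginary components after a fast Fourier transform to estimate the Fourier coefficients and thus obtain a deblurred image. Furthermore, we combine the multi-scale cubic-mixer with a slicing strategy to generate high-quality results at a much lower computational cost. Experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art deblurring approaches on several benchmarks and on a new ultra-high-definition dataset in terms of accuracy and speed.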
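We present a practical learning-based image dehazing network trained from an unpaired set of clear and hazy images. This paper provides a new perspective that treats image dehazing as a two-class factor disentanglement task: the task-relevant factors of clear image reconstruction and the task-irrelevant factors of the haze-related distribution. To disentangle these two classes of factors in deep feature space, contrastive learning is introduced into a CycleGAN framework to learn disentangled representations by guiding the generated images to be associated with the latent factors. With this formulation, the proposed contrastive disentangled dehazing method (CDD-GAN) employs negative generators that cooperate with the encoder network and are updated alternately, producing a queue of challenging negative adversaries. These negative adversaries are then trained end-to-end together with the backbone representation network to enhance discriminative information and promote factor disentanglement by maximizing the adversarial contrastive loss. During training, we further show that hard negative examples can suppress the task-irrelevant factors and unpaired clear examples can enhance the task-relevant factors, which better facilitates haze removal and helps image restoration. Extensive experiments on both synthetic and real-world datasets demonstrate that our method performs favorably against existing unpaired dehazing baselines.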
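Deep reinforcement learning (DRL) has revolutionized learning and actuation in applications such as game playing and robotic control. The cost of data collection, i.e., generating transitions from agent-environment interactions, remains a major challenge for wider DRL adoption in complex real-world problems. The cloud-native paradigm of training DRL agents on a GPU cloud platform is a promising solution. In this paper, we present ElegantRL-Podracer, a scalable and elastic library for cloud-native deep reinforcement learning, which efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels. At a high level, ElegantRL-Podracer employs a tournament-based ensemble scheme to orchestrate the training process on hundreds or even thousands of GPUs, scheduling the interactions between a leaderboard and a training pool with hundreds of pods. At a low level, each pod simulates agent-environment interactions by fully utilizing nearly 7,000 CUDA cores in a single GPU. Our ElegantRL-Podracer library features high scalability, elasticity, and accessibility by following the development principles of containerization, microservices, and MLOps. Using an NVIDIA DGX SuperPOD cloud, we conduct extensive experiments on various tasks in locomotion and stock trading and show that ElegantRL-Podracer substantially outperforms RLlib. Our code is available on GitHub.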
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, or not at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student differs from that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, with gains of +4.2%/+2.4%/+1.4% for the ViT-Tiny, ViT-Small, and ViT-Base models, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as most previous works do. Code is available at https://github.com/OliverRensu/TinyMIM.
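Finding 1) above, distilling token relations rather than features, can be sketched as follows. This is a simplified illustration under assumptions, not TinyMIM's implementation: it assumes a relation is a row-wise softmax of scaled dot products between token vectors and that the student is trained with a KL divergence against the teacher's relations. All function names and toy tensors are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(queries, keys):
    """Row-wise softmax of scaled dot products: an N x N relation matrix."""
    scale = math.sqrt(len(queries[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def relation_distill_loss(teacher_rel, student_rel, eps=1e-12):
    """Mean KL(teacher || student) over the rows of the relation matrices."""
    total = 0.0
    for t_row, s_row in zip(teacher_rel, student_rel):
        total += sum(
            t * math.log((t + eps) / (s + eps)) for t, s in zip(t_row, s_row)
        )
    return total / len(teacher_rel)


# Toy tokens: three tokens each, teacher width 4, student width 2.
teacher_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
student_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

teacher_rel = token_relations(teacher_tokens, teacher_tokens)
student_rel = token_relations(student_tokens, student_tokens)
loss = relation_distill_loss(teacher_rel, student_rel)
```

One practical appeal of this target is visible in the toy data: the relation matrices are N x N regardless of embedding width, so no projection layer is needed to match a narrow student to a wide teacher.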
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Benefiting from its capability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, while inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
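The objective described at the end of this abstract can be sketched in a few lines. This is a simplified illustration under assumptions, not the CCGC loss itself: positives are taken to be all cross-view pairs sharing a cluster label, negatives are pairs of distinct cluster centers averaged over both views, and the two terms are combined with equal weight. The function names and toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_centers(view1, view2, labels):
    """Mean embedding of each cluster, pooled over both views."""
    groups = {}
    for emb1, emb2, label in zip(view1, view2, labels):
        groups.setdefault(label, []).extend([emb1, emb2])
    return {
        label: [sum(col) / len(embs) for col in zip(*embs)]
        for label, embs in groups.items()
    }


def cluster_guided_loss(view1, view2, labels):
    """Pull same-cluster cross-view pairs together, push cluster centers apart."""
    n = len(labels)
    pos = [
        cosine(view1[i], view2[j])
        for i in range(n)
        for j in range(n)
        if labels[i] == labels[j]
    ]
    centers = list(cluster_centers(view1, view2, labels).values())
    neg = [
        cosine(centers[a], centers[b])
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    ]
    pos_term = 1.0 - sum(pos) / len(pos)      # small when positives align
    neg_term = sum(neg) / len(neg) if neg else 0.0  # small when centers differ
    return pos_term + neg_term


# Toy embeddings of four nodes produced by two (hypothetical) encoders.
view1 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
view2 = [[0.95, 0.05], [1.0, 0.0], [0.05, 0.95], [0.0, 1.0]]

good_loss = cluster_guided_loss(view1, view2, [0, 0, 1, 1])
bad_loss = cluster_guided_loss(view1, view2, [0, 1, 0, 1])
```

A clustering that matches the embedding geometry (`good_loss`) yields a lower value than a mismatched one (`bad_loss`), which is the behavior the objective is meant to reward.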
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models are connected. The results show that R2RISE successfully distinguishes the important frames in the demonstrations.
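The masking-and-retraining loop described in this abstract can be sketched as follows. This is an illustrative approximation under assumptions, not the R2RISE implementation: `train_and_evaluate` is a hypothetical callback that stands in for retraining the IL model on the masked demonstration and returning the environment return, and each frame's score is normalized by the number of masks that kept it.

```python
import random


def importance_map(num_frames, train_and_evaluate, num_masks=500,
                   keep_prob=0.5, seed=0):
    """Estimate per-frame importance from returns under random frame masks."""
    rng = random.Random(seed)
    weighted = [0.0] * num_frames  # sum of returns over masks keeping a frame
    kept = [0] * num_frames        # number of masks that kept each frame
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = train_and_evaluate(mask)  # retrain on kept frames, then evaluate
        for i, keep in enumerate(mask):
            if keep:
                weighted[i] += ret
                kept[i] += 1
    return [w / k if k else 0.0 for w, k in zip(weighted, kept)]


def toy_return(mask):
    """Hypothetical stand-in for retraining: only frames 2 and 5 matter."""
    return float(mask[2] + mask[5])


scores = importance_map(8, toy_return)
```

In the toy setup, frames 2 and 5 receive visibly higher scores than the rest, because the return is higher on average whenever they are kept. In a real pipeline each call to `train_and_evaluate` is a full retraining run, so the number of masks is the dominant cost.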